Abstract. The subdifferential of convex functions of the singular spectrum of real matrices 
has been widely studied in matrix analysis, optimization and automatic control theory. 
Convex analysis and optimization over spaces of tensors are now gaining much interest due 
to their potential applications to signal processing, statistics and engineering. The goal of this 
paper is to present an application to the problem of low rank tensor recovery based on 
random linear measurements by extending the results of Tropp [6] to the tensor setting. 


1 Introduction 

1.1 Background 

Tensors have recently been a subject of great interest in the applied mathematics community. 
We refer to [3, 4] for a modern reference on this subject. Many applications of tensors are based 
on solving tensor related optimization problems, such as minimizing certain norms under linear 
constraints. Such problems have been recently successfully addressed in the 2D setting, i.e. for 
matrices, by the statistics, signal processing, inverse problems and automatic control communities 
in particular. Two of the reasons for this rapid growth of interest in the application of matrix 
norms to penalized estimation problems are that some norms promote spectral sparsity and that 
much work has been done in the fields of matrix analysis and convex analysis to analyze the 
subdifferential of such norms; see for example [7] and [S]. Our goal in the present paper is to 
extend previous results on matrix norms to the tensor setting. In particular, we propose a general 
study of the subdifferential of certain convex functions of the spectrum of real tensors and apply 
our results to the computation of the subdifferential of useful and natural matrix norms. We also 
present an application of our formulas to the problem of low rank tensor recovery using sparsity 
promoting norm minimization under random linear constraints, a natural extension of previous 
works by Tropp [5]. 

1.2 Notations 

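For any convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$, the conjugate function $f^*$ associated with $f$ is defined 
by 

\[ f^*(g) := \sup_{x \in \mathbb{R}^n} \; \langle g, x \rangle - f(x). \]

The subdifferential of $f$ at $x \in \mathbb{R}^n$ is defined by 

\[ \partial f(x) := \left\{ g \in \mathbb{R}^n \;\middle|\; \forall y \in \mathbb{R}^n, \; f(y) \ge f(x) + \langle g, y - x \rangle \right\}. \]

Moreover, it is well known (see e.g. 0) that $g \in \partial f(x)$ if and only if 

\[ f(x) + f^*(g) = \langle g, x \rangle. \]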
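As a small numerical illustration of this characterization, the sketch below takes $f(x) = \frac{1}{2}\|x\|_2^2$, for which $f^*(g) = \frac{1}{2}\|g\|_2^2$ and $\partial f(x) = \{x\}$. The function names are illustrative choices and do not come from the paper.

```python
def f(x):
    """f(x) = 0.5 * ||x||_2^2, a smooth convex function on R^n."""
    return 0.5 * sum(xi * xi for xi in x)

def f_conj(g):
    """Conjugate of f; for this particular f, f*(g) = 0.5 * ||g||_2^2."""
    return 0.5 * sum(gi * gi for gi in g)

def inner(g, x):
    """Euclidean scalar product <g, x>."""
    return sum(gi * xi for gi, xi in zip(g, x))

def fenchel_gap(g, x):
    """f(x) + f*(g) - <g, x>: nonnegative, and zero exactly when g is a subgradient at x."""
    return f(x) + f_conj(g) - inner(g, x)

x = [1.0, -2.0, 0.5]
gap_at_subgradient = fenchel_gap(x, x)            # g = x is the gradient of f at x
gap_elsewhere = fenchel_gap([0.0, 0.0, 0.0], x)   # g = 0 is not a subgradient at x
```

The gap vanishes at the subgradient and is strictly positive for the other choice of $g$, as the equality above predicts.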

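In the present paper, a tensor is represented by a multi-dimensional array in $\mathbb{R}^{n_1 \times \cdots \times n_D}$. Let 
$D$ and $n_1, \ldots, n_D$ be positive integers. Let $\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_D}$ denote a $D$-dimensional tensor. If 
$n_1 = \cdots = n_D$, then we say that $\mathcal{X}$ is cubic. The set of $D$-mode cubic tensors will be denoted by 
$\mathbb{R}^{n \times \cdots \times n}$, where $D$ will stay implicit. For any index set $C \subset \{1, \ldots, n_1\} \times \cdots \times \{1, \ldots, n_D\}$, $\mathcal{X}_C$ 
will denote the subarray $(x_{i_1, \ldots, i_D})_{(i_1, \ldots, i_D) \in C}$. 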


2 Basics on tensors 


2.1 Tensor norms 

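The spectrum of a tensor. Let us define the spectrum as the mapping which to any tensor 
$\mathcal{X} \in \mathbb{R}^{n_1 \times \cdots \times n_D}$ associates the vector $\sigma(\mathcal{X})$ built from the vectors 
$\sigma^{(1)}(\mathcal{X}), \ldots, \sigma^{(D)}(\mathcal{X})$, where $\sigma^{(d)}(\mathcal{X})$ denotes the vector consisting of the singular values of the mode-$d$ matricization of 
$\mathcal{X}$. 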

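Norms of tensors. Let $\mathcal{X} = (x_{ijk})$ and $\mathcal{Y} = (y_{ijk})$ be tensors in $\mathbb{R}^{n_1 \times \cdots \times n_D}$. We can define 
several tensor norms on $\mathbb{R}^{n_1 \times \cdots \times n_D}$. The first one is a natural extension of the Frobenius norm or 
Hilbert-Schmidt norm from matrices to tensors. We start by defining the following scalar product 
on $\mathbb{R}^{n_1 \times \cdots \times n_D}$: 

\[ \langle \mathcal{X}, \mathcal{Y} \rangle := \sum_{i_1=1}^{n_1} \cdots \sum_{i_D=1}^{n_D} x_{i_1, \ldots, i_D} \, y_{i_1, \ldots, i_D}. \]

Using this scalar product, we can define the following norm, which we call the Frobenius norm: 

\[ \|\mathcal{X}\|_F := \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle}. \]

One may also define an “operator norm” in the same manner as for matrices as follows: 

\[ \|\mathcal{X}\| := \max_{\|u^{(d)}\|_2 = 1, \; d = 1, \ldots, D} \; \langle \mathcal{X}, u^{(1)} \otimes \cdots \otimes u^{(D)} \rangle. \]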

We also define 

\[ \|\mathcal{X}\|_* := \frac{1}{D} \sum_{d=1}^{D} \|\sigma^{(d)}(\mathcal{X})\|_1. \]
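To make the objects of this subsection concrete, here is a minimal NumPy sketch of the mode-$d$ matricization, the vectors of mode-$d$ singular values, and the Frobenius norm. The helper names (`unfold`, `mode_spectra`, `frobenius_norm`) and the column ordering of the unfolding are assumptions made for illustration; the singular values do not depend on that ordering.

```python
import numpy as np

def unfold(X, d):
    """Mode-d matricization: mode d indexes the rows, all other modes the columns."""
    return np.moveaxis(X, d, 0).reshape(X.shape[d], -1)

def mode_spectra(X):
    """Singular values of each mode-d matricization, one vector per mode."""
    return [np.linalg.svd(unfold(X, d), compute_uv=False) for d in range(X.ndim)]

def inner(X, Y):
    """Entrywise scalar product of two tensors of the same shape."""
    return float(np.sum(X * Y))

def frobenius_norm(X):
    """Norm induced by the scalar product above."""
    return float(np.sqrt(inner(X, X)))

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))
spectra = mode_spectra(X)
```

Since every matricization only rearranges the entries, the Euclidean norm of each vector in `spectra` coincides with `frobenius_norm(X)`.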


2.2 Orthogonally decomposable tensors 

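The orthogonally decomposable (ODEC) tensors are defined as follows. 

Definition 2.1 Let $\mathcal{X}$ be a tensor in $\mathbb{R}^{n_1 \times \cdots \times n_D}$. If 

\[ \mathcal{X} = \sum_{i=1}^{r} \alpha_i \, u_i^{(1)} \otimes \cdots \otimes u_i^{(D)}, \tag{2.1} \]

where $r \le n_1 \wedge \cdots \wedge n_D$, $\alpha_1 \ge \cdots \ge \alpha_r > 0$ and $\{u_1^{(d)}, \ldots, u_r^{(d)}\}$ is a family of orthonormal vectors 
for $d = 1, \ldots, D$, then we say (2.1) is an orthogonal decomposition of $\mathcal{X}$. 

Denote $\alpha = (\alpha_1, \ldots, \alpha_r, 0, \ldots, 0)$ in $\mathbb{R}^{n_1 \wedge \cdots \wedge n_D}$. For each $d \in \{1, \ldots, D\}$, we may complete 
$\{u_1^{(d)}, \ldots, u_r^{(d)}\}$ with $\{u_{r+1}^{(d)}, \ldots, u_{n_d}^{(d)}\}$ so that the matrix $U^{(d)} = (u_1^{(d)}, \ldots, u_{n_d}^{(d)}) \in \mathbb{R}^{n_d \times n_d}$ is orthogonal. 
Using $U^{(1)}, \ldots, U^{(D)}$, we may write (2.1) as 

\[ \mathcal{X} = \mathcal{D}(\alpha) \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_D U^{(D)}, \tag{2.2} \]

where $\mathcal{D}(\alpha) = \mathrm{diag}(\alpha)$ is a diagonal tensor with the $i$th diagonal entry being $\alpha_i$ for $i = 1, \ldots, r$ and the 
other diagonal entries being zero. Note that representation (2.2) is generally not unique unless 
$n_1 = \cdots = n_D$ and $\alpha_1, \ldots, \alpha_r$ are all distinct. 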

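It is easy to calculate the norms of ODEC tensors. 

Proposition 2.2 Let $\mathcal{X}$ be an orthogonally decomposable tensor and let 

\[ \mathcal{X} = \sum_{i=1}^{r} \alpha_i \, u_i^{(1)} \otimes \cdots \otimes u_i^{(D)} \]

be an orthogonal decomposition of $\mathcal{X}$. Then 

\[ \|\mathcal{X}\| = \alpha_1 \quad \text{and} \quad \|\mathcal{X}\|_* = \sum_{i=1}^{r} \alpha_i. \]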
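The following sketch builds a 3-mode orthogonally decomposable tensor from randomly drawn orthonormal factors and checks two facts that sit behind these norm computations: every mode-$d$ matricization has singular values equal to the weights of the decomposition, and the multilinear form evaluated at the leading vectors returns the largest weight. The helper names and the use of a QR factorization to draw orthonormal columns are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def random_orthonormal(n, r, rng):
    """n x r matrix with orthonormal columns, obtained from a QR factorization."""
    q, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return q

def odec_tensor(alpha, factors):
    """sum_i alpha_i * u_i^(1) (x) u_i^(2) (x) u_i^(3) for three factor matrices."""
    U1, U2, U3 = factors
    return np.einsum("i,ai,bi,ci->abc", alpha, U1, U2, U3)

def unfold(X, d):
    """Mode-d matricization (rows indexed by mode d)."""
    return np.moveaxis(X, d, 0).reshape(X.shape[d], -1)

rng = np.random.default_rng(1)
alpha = np.array([3.0, 2.0, 0.5])      # weights, sorted in nonincreasing order
dims = (4, 5, 6)
factors = [random_orthonormal(n, len(alpha), rng) for n in dims]
X = odec_tensor(alpha, factors)

# Singular values of each unfolding: alpha followed by zeros.
spectra = [np.linalg.svd(unfold(X, d), compute_uv=False) for d in range(3)]

# Value of <X, u_1^(1) (x) u_1^(2) (x) u_1^(3)>, which equals alpha_1.
top_value = float(np.einsum("abc,a,b,c->", X,
                            factors[0][:, 0], factors[1][:, 0], factors[2][:, 0]))
```

The check on `top_value` only shows that the operator norm is at least the largest weight; the matching upper bound is part of the proposition and is not verified numerically here.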

3 Further results on the spectrum 

In this section, we will present some further results on the spectrum such as the question of 
characterizing the image of the spectrum and the subdifferential of a function of the spectrum. 


3.1 A technical prerequisite: Von Neumann’s inequality for tensors 

Von Neumann’s inequality says that for any two matrices $X$ and $Y$ in $\mathbb{R}^{n_1 \times n_2}$, we have 

\[ \langle X, Y \rangle \le \langle \sigma(X), \sigma(Y) \rangle, \]

with equality when the singular vectors of $X$ and $Y$ are equal, up to permutations when the 
singular values have multiplicity greater than one. This result has proved useful for the study of 
the subdifferential of unitarily invariant convex functions of the spectrum in the matrix case in 
[5]. In order to study the subdifferential of the norms of certain types of tensors, we will need a 
generalization of this result to higher orders. This was worked out in [1]. Let us recall the content 
of the main result of [1]. 
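The matrix inequality and its equality case can be checked numerically. In the sketch below, the first pair of matrices is generic, and the second pair shares the same singular vectors by construction; the variable names and the particular vector `t` are arbitrary illustrative choices.

```python
import numpy as np

def singular_values(A):
    """Singular values of A in nonincreasing order."""
    return np.linalg.svd(A, compute_uv=False)

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 6))
Y = rng.standard_normal((4, 6))

# Generic pair: <X, Y> is bounded by <sigma(X), sigma(Y)>.
lhs = float(np.sum(X * Y))
rhs = float(singular_values(X) @ singular_values(Y))

# Pair sharing singular vectors: Z = U diag(t) V^T with X = U diag(s) V^T.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
t = np.array([4.0, 3.0, 2.0, 1.0])     # nonincreasing, nonnegative
Z = U @ np.diag(t) @ Vt
lhs_shared = float(np.sum(X * Z))
rhs_shared = float(s @ t)
```

For the second pair the two sides agree up to rounding, which is the equality case described above.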

Definition 3.1 We say that a tensor $\mathcal{S}$ is blockwise decomposable if there exists an integer $B$ 
and if, for all $d = 1, \ldots, D$, there exists a partition $I_1^{(d)} \cup \cdots \cup I_B^{(d)}$ into disjoint index subsets of 
$\{1, \ldots, n_d\}$, such that $\mathcal{S}_{i_1, \ldots, i_D} = 0$ if for all $b = 1, \ldots, B$, $(i_1, \ldots, i_D) \notin I_b^{(1)} \times \cdots \times I_b^{(D)}$. 

An illustration of this block decomposition can be found in Figure 1. The following result is a 
generalization of von Neumann’s inequality from matrices to tensors. It is proved in [1]. 

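Theorem 3.2 Let $\mathcal{X}, \mathcal{Y} \in \mathbb{R}^{n_1 \times \cdots \times n_D}$ be tensors. Then for all $d = 1, \ldots, D$, we have 

\[ \langle \mathcal{X}, \mathcal{Y} \rangle \le \langle \sigma^{(d)}(\mathcal{X}), \sigma^{(d)}(\mathcal{Y}) \rangle. \tag{3.3} \]

Equality in (3.3) holds simultaneously for all $d = 1, \ldots, D$ if and only if there exist orthogonal 
matrices $W^{(d)} \in \mathbb{R}^{n_d \times n_d}$ for $d = 1, \ldots, D$ and tensors $\mathcal{D}(\mathcal{X}), \mathcal{D}(\mathcal{Y}) \in \mathbb{R}^{n_1 \times \cdots \times n_D}$ such that 

\[ \mathcal{X} = \mathcal{D}(\mathcal{X}) \times_1 W^{(1)} \cdots \times_D W^{(D)}, \]

\[ \mathcal{Y} = \mathcal{D}(\mathcal{Y}) \times_1 W^{(1)} \cdots \times_D W^{(D)}, \]

where $\mathcal{D}(\mathcal{X})$ and $\mathcal{D}(\mathcal{Y})$ satisfy the following properties: 

(i) $\mathcal{D}(\mathcal{X})$ and $\mathcal{D}(\mathcal{Y})$ are block-wise decomposable with the same number of blocks, which we will 
denote by $B$, 

(ii) the blocks $\{\mathcal{D}_b(\mathcal{X})\}_{b=1,\ldots,B}$ (resp. $\{\mathcal{D}_b(\mathcal{Y})\}_{b=1,\ldots,B}$) on the diagonal of $\mathcal{D}(\mathcal{X})$ (resp. $\mathcal{D}(\mathcal{Y})$) have 
the same sizes, 

(iii) for each $b = 1, \ldots, B$ the two blocks $\mathcal{D}_b(\mathcal{X})$ and $\mathcal{D}_b(\mathcal{Y})$ are proportional. 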
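A quick numerical sanity check of the inequality is sketched below for two random 3-mode tensors. It compares the scalar product with the scalar product of the mode-$d$ singular value vectors for each mode. The helper names are illustrative, and the check says nothing about the equality case.

```python
import numpy as np

def unfold(X, d):
    """Mode-d matricization (rows indexed by mode d)."""
    return np.moveaxis(X, d, 0).reshape(X.shape[d], -1)

def mode_singular_values(X, d):
    """Singular values of the mode-d matricization, in nonincreasing order."""
    return np.linalg.svd(unfold(X, d), compute_uv=False)

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 4, 5))
Y = rng.standard_normal((3, 4, 5))

inner_xy = float(np.sum(X * Y))
upper_bounds = [float(mode_singular_values(X, d) @ mode_singular_values(Y, d))
                for d in range(X.ndim)]
```

Each entry of `upper_bounds` dominates `inner_xy`, since the scalar product of two tensors equals the scalar product of their mode-$d$ matricizations, to which the matrix inequality applies.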



Fig. 1. A block-wise diagonal tensor. 


3.2 Subdifferential for ODEC tensors 


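Theorem 3.3 Let $f : \mathbb{R}^n \times \cdots \times \mathbb{R}^n \to \mathbb{R}$ satisfy property 

\[ f(s_1, \ldots, s_D) = f(s_{\tau(1)}, \ldots, s_{\tau(D)}) \tag{3.4} \]

for all $\tau \in \mathfrak{S}_D$. Then for all ODEC tensors $\mathcal{X}$, we have 

\[ (f \circ \sigma)^*(\mathcal{X}) = f^*(\sigma(\mathcal{X})). \tag{3.5} \]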


Using this result combined with von Neumann’s inequality for tensors, one easily obtains the 
following corollary. 

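Corollary 3.4 Let $f : \mathbb{R}^n \times \cdots \times \mathbb{R}^n \to \mathbb{R}$ satisfy property 

\[ f(s_1, \ldots, s_D) = f(s_{\tau(1)}, \ldots, s_{\tau(D)}) \tag{3.6} \]

for all $\tau \in \mathfrak{S}_D$. Let $\mathcal{X}$ be an ODEC tensor. Then necessary and sufficient conditions for an ODEC 
tensor $\mathcal{Y}$ to belong to $\partial (f \circ \sigma)(\mathcal{X})$ are 

1. $\mathcal{Y}$ has the same mode-$d$ singular spaces as $\mathcal{X}$ for all $d = 1, \ldots, D$, 

2. $\sigma(\mathcal{Y}) \in \partial f(\sigma(\mathcal{X}))$. 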

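Corollary 3.5 Let $\mathcal{X} = \mathcal{D}(\alpha) \times_1 U^{(1)} \times_2 \cdots \times_D U^{(D)}$ be an ODEC tensor. Then the subdifferential 
$\partial \|\cdot\|_*(\mathcal{X})$ includes the following set: 

\[ \Omega = \left\{ \mathcal{D}(1) \times_1 U^{(1)} \times_2 \cdots \times_D U^{(D)} + \mathcal{V} \;\middle|\; \|\mathcal{V}\| \le 1, \; \mathcal{V} \times_i U^{(i)T} = 0, \; i = 1, \ldots, D \right\}. \]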


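4 Application to tensor recovery with Gaussian measurements 

Let $\mathcal{X}^{\#} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ be an unknown true signal, $\Phi(\cdot) : \mathbb{R}^{n_1 \times n_2 \times n_3} \to \mathbb{R}^m$ be a known linear 
measurement mapping and 

\[ y = \Phi(\mathcal{X}^{\#}) + \xi \tag{4.7} \]

be a noisy vector of measurements in $\mathbb{R}^m$. 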




We focus on the following optimization problem: 

\[ \min_{\mathcal{X}} \; \|\mathcal{X}\|_* \quad \text{subject to} \quad \|\Phi(\mathcal{X}) - y\| \le \eta. \tag{4.8} \]


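Let $\hat{\mathcal{X}}$ be any solution of optimization problem (4.8). We are interested in giving a bound for 
$\|\hat{\mathcal{X}} - \mathcal{X}^{\#}\|_F$. 

The main tool of this section is the following result by Tropp [5]: 

Theorem 4.1 Assume that $\|\xi\| \le \eta$. Then with probability at least $1 - e^{-t^2/2}$, we have 

\[ \|\hat{\mathcal{X}} - \mathcal{X}^{\#}\|_F \le \frac{2\eta}{\left[ \sqrt{m - 1} - w(\mathscr{D}(\|\cdot\|_*, \mathcal{X}^{\#})) - t \right]_+}, \]

where $[a]_+ = \max\{a, 0\}$ for any $a \in \mathbb{R}$. 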

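The quantity $w(\mathscr{D}(\|\cdot\|_*, \mathcal{X}^{\#}))$ denotes the conic Gaussian width $w(\cdot)$ of the descent cone $\mathscr{D}(\|\cdot\|_*, \mathcal{X}^{\#})$. 
The definitions of these notions are given as follows: 

Definition 4.2 Let $K \subset \mathbb{R}^d$ be a cone. The conic Gaussian width $w(K)$ is defined as 

\[ w(K) = \mathbb{E}\Big[ \sup_{u \in K \cap S^{d-1}} \langle g, u \rangle \Big], \]

where $g \sim \mathcal{N}(0, I)$ is a standard Gaussian vector and $S^{d-1}$ denotes the unit sphere in $\mathbb{R}^d$. 

Definition 4.3 Let $f : \mathbb{R}^d \to \overline{\mathbb{R}}$ be a proper convex function. The descent cone $\mathscr{D}(f, x)$ of the 
function $f$ at a point $x \in \mathbb{R}^d$ is defined as 

\[ \mathscr{D}(f, x) := \{ \lambda u \mid \lambda \ge 0, \; u \in \mathbb{R}^d, \; f(x + u) \le f(x) \}. \]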
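For cones where the supremum in the definition of the conic Gaussian width has a closed form, the expectation can be estimated by Monte Carlo. The sketch below treats two simple cones, the whole space and a coordinate subspace; it is only meant to illustrate the definition and does not attempt the descent cone of a tensor norm. The function `gaussian_width_mc` and its parameters are hypothetical names introduced here.

```python
import numpy as np

def gaussian_width_mc(sup_on_sphere, d, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[ sup_{u in K, ||u||_2 = 1} <g, u> ].

    sup_on_sphere(g) must return the supremum for one Gaussian draw g in R^d.
    """
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((n_samples, d))
    return float(np.mean([sup_on_sphere(g) for g in samples]))

d, k = 50, 10
# K = R^d: the supremum over the unit sphere is ||g||_2.
w_full = gaussian_width_mc(lambda g: np.linalg.norm(g), d)
# K = span(e_1, ..., e_k): the supremum is the norm of the first k coordinates.
w_sub = gaussian_width_mc(lambda g: np.linalg.norm(g[:k]), d)
```

In both cases the exact width is the expected norm of a standard Gaussian vector, which is at most $\sqrt{d}$ and $\sqrt{k}$ respectively, so smaller cones give smaller widths and hence require fewer measurements in Theorem 4.1.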

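According to Theorem 4.1, the error bound on $\|\hat{\mathcal{X}} - \mathcal{X}^{\#}\|_F$ depends on the conic Gaussian 
width $w(\cdot)$ of the descent cone $\mathscr{D}(\|\cdot\|_*, \mathcal{X}^{\#})$. The following result reveals that the latter is 
closely related to the subdifferential of $\|\cdot\|_*$ at $\mathcal{X}^{\#}$. 

Proposition 4.4 Assume that $\partial \|\mathcal{X}^{\#}\|_*$ is nonempty and does not contain the origin. Then 

\[ w^2(\mathscr{D}(\|\cdot\|_*, \mathcal{X}^{\#})) \le \mathbb{E} \inf_{\tau \ge 0} \mathrm{dist}_F^2(\mathcal{G}, \tau \partial \|\mathcal{X}^{\#}\|_*), \]

where $\mathcal{G} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is a tensor with i.i.d. standard Gaussian entries and 

\[ \mathrm{dist}_F(\mathcal{G}, \tau \partial \|\mathcal{X}^{\#}\|_*) := \inf_{\mathcal{Y} \in \tau \partial \|\mathcal{X}^{\#}\|_*} \|\mathcal{G} - \mathcal{Y}\|_F, \]

i.e. the distance between $\mathcal{G}$ and the set $\tau \partial \|\mathcal{X}^{\#}\|_*$. 

To derive a bound for $\|\hat{\mathcal{X}} - \mathcal{X}^{\#}\|_F$, we need to give an upper bound for 

\[ \mathbb{E} \inf_{\tau \ge 0} \mathrm{dist}_F^2(\mathcal{G}, \tau \partial \|\mathcal{X}^{\#}\|_*). \]

The following result establishes such a bound in the case that $\mathcal{X}^{\#}$ is ODEC. 

Proposition 4.5 If $\mathcal{X}^{\#}$ is ODEC, then we have the following bound: 

\[ \mathbb{E} \inf_{\tau \ge 0} \mathrm{dist}_F^2(\mathcal{G}, \tau \partial \|\mathcal{X}^{\#}\|_*) \le r^3 + r + 3r(n_1 + n_2 + n_3 - 3r) + r(n_1 n_2 + n_2 n_3 + n_1 n_3) - r^2(n_1 + n_2 + n_3). \]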
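To get a feel for the size of this bound, the sketch below evaluates its right-hand side, $r^3 + r + 3r(n_1+n_2+n_3-3r) + r(n_1n_2+n_2n_3+n_1n_3) - r^2(n_1+n_2+n_3)$, and compares it with the ambient dimension $n_1 n_2 n_3$. The function name and the example sizes are arbitrary choices made for illustration.

```python
def width_sq_bound(r, n1, n2, n3):
    """Right-hand side of the bound of Proposition 4.5 for rank r and sizes n1, n2, n3."""
    s1 = n1 + n2 + n3                  # sum of the dimensions
    s2 = n1 * n2 + n2 * n3 + n1 * n3   # sum of the pairwise products
    return r**3 + r + 3 * r * (s1 - 3 * r) + r * s2 - r**2 * s1

n1 = n2 = n3 = 20
r = 2
ambient = n1 * n2 * n3                 # number of entries of the tensor
bound = width_sq_bound(r, n1, n2, n3)
```

For these sizes the bound is 2494, against 8000 entries in the tensor, which indicates the regime in which fewer measurements than unknowns can suffice in Theorem 4.1.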




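Proof. If $\mathcal{X}^{\#}$ is orthogonally decomposable, i.e. 

\[ \mathcal{X}^{\#} = \sum_{i=1}^{r} \sigma_i \, u_i^{(1)} \otimes u_i^{(2)} \otimes u_i^{(3)} = \mathcal{D}(\sigma) \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}, \]

where $\mathcal{D}(\sigma)$ is a diagonal tensor with diagonal elements $\sigma = (\sigma_1, \ldots, \sigma_r)$ and $U^{(j)} = (u_1^{(j)}, \ldots, u_r^{(j)})$ 
for $j = 1, 2, 3$, then the subdifferential $\partial \|\cdot\|_*(\mathcal{X}^{\#})$ includes the following set: 

\[ \Omega = \left\{ \mathcal{D}(1) \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)} + \mathcal{V} \;\middle|\; \|\mathcal{V}\| \le 1, \; \mathcal{V} \times_i U^{(i)T} = 0, \; i = 1, 2, 3 \right\}. \tag{4.9} \]

Hence 

\[ \mathbb{E} \inf_{\tau \ge 0} \mathrm{dist}_F^2(\mathcal{G}, \tau \partial \|\mathcal{X}^{\#}\|_*) \le \mathbb{E} \inf_{\tau \ge 0} \mathrm{dist}_F^2(\mathcal{G}, \tau \Omega) = \mathbb{E} \inf_{\tau \ge 0} \inf_{\mathcal{Y} \in \Omega} \|\mathcal{G} - \tau \mathcal{Y}\|_F^2. \]

Note that $\mathcal{V}$ in (4.9) can also be characterized by 

\[ \mathcal{V} = \mathcal{T} \times_1 U_\perp^{(1)} \times_2 U_\perp^{(2)} \times_3 U_\perp^{(3)}, \tag{4.10} \]

where $\mathcal{T} \in \mathbb{R}^{(n_1 - r) \times (n_2 - r) \times (n_3 - r)}$ is a tensor such that $\|\mathcal{T}\| \le 1$ and $U_\perp^{(i)} \in \mathbb{R}^{n_i \times (n_i - r)}$ is a matrix 
such that $\tilde{U}^{(i)} = (U^{(i)} \mid U_\perp^{(i)})$ is orthogonal for $i = 1, 2, 3$. In view of (4.9) and (4.10), we assert that 
any $\mathcal{Y} \in \Omega$ can be written as 

\[ \mathcal{Y} = \mathcal{C} \times_1 \tilde{U}^{(1)} \times_2 \tilde{U}^{(2)} \times_3 \tilde{U}^{(3)}, \]

where the tensor $\mathcal{C}$ is block-wise diagonal with two diagonal blocks $\mathcal{C}_1 = \mathrm{diag}(1) \in \mathbb{R}^{r \times r \times r}$ and 
$\mathcal{C}_2 = \mathcal{T} \in \mathbb{R}^{(n_1 - r) \times (n_2 - r) \times (n_3 - r)}$. 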

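Because $\mathcal{G} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is a tensor with i.i.d. standard Gaussian entries, for any 
orthogonal matrices $W^{(1)}, W^{(2)}, W^{(3)}$ of appropriate sizes, the tensor $\mathcal{G} \times_1 W^{(1)} \times_2 W^{(2)} \times_3 W^{(3)}$ 
still has i.i.d. standard Gaussian entries. Therefore, we may choose a coordinate system such that 

\[ \mathbb{E} \inf_{\tau \ge 0} \inf_{\mathcal{Y} \in \Omega} \|\mathcal{G} - \tau \mathcal{Y}\|_F^2 = \mathbb{E} \inf_{\tau \ge 0} \inf_{\mathcal{C} \in \tilde{\Omega}} \|\mathcal{G} - \tau \mathcal{C}\|_F^2, \]

where $\tilde{\Omega}$ denotes the set of block-wise diagonal tensors with two diagonal blocks $\mathcal{C}_1 = \mathcal{D}(1) \in \mathbb{R}^{r \times r \times r}$ 
and $\mathcal{C}_2 \in \mathbb{R}^{(n_1 - r) \times (n_2 - r) \times (n_3 - r)}$ verifying $\|\mathcal{C}_2\| \le 1$. Partitioning $\mathcal{G}$ in the same manner, 
we obtain 

\[ \|\mathcal{G} - \tau \mathcal{C}\|_F^2 = \|\mathcal{G}_{111} - \tau \mathcal{D}(1)\|_F^2 + \|\mathcal{G}_{222} - \tau \mathcal{C}_2\|_F^2 + \sum_{\substack{i, j, k = 1 \\ i, j, k \text{ not all equal}}}^{2} \|\mathcal{G}_{ijk}\|_F^2. \]

Since $\mathcal{G}$ is a tensor with independent standard Gaussian entries, it follows that 

\[ \mathbb{E} \sum_{\substack{i, j, k = 1 \\ i, j, k \text{ not all equal}}}^{2} \|\mathcal{G}_{ijk}\|_F^2 = r(n_1 n_2 + n_2 n_3 + n_1 n_3) - r^2(n_1 + n_2 + n_3). \]

Thus, writing $\mathrm{diag}(\tau) = \tau \mathcal{D}(1)$, 

\[ \mathbb{E} \inf_{\tau \ge 0} \inf_{\mathcal{C} \in \tilde{\Omega}} \|\mathcal{G} - \tau \mathcal{C}\|_F^2 = \mathbb{E} \inf_{\tau \ge 0} \inf_{\|\mathcal{C}_2\| \le 1} \left( \|\mathcal{G}_{111} - \mathrm{diag}(\tau)\|_F^2 + \|\mathcal{G}_{222} - \tau \mathcal{C}_2\|_F^2 \right) + r(n_1 n_2 + n_2 n_3 + n_1 n_3) - r^2(n_1 + n_2 + n_3). \]

Choosing $\tau = \|\mathcal{G}_{222}\|$ and $\mathcal{C}_2 = \mathcal{G}_{222} / \|\mathcal{G}_{222}\|$, for which the second term vanishes, we get 

\[ \mathbb{E} \inf_{\tau \ge 0} \inf_{\|\mathcal{C}_2\| \le 1} \left( \|\mathcal{G}_{111} - \mathrm{diag}(\tau)\|_F^2 + \|\mathcal{G}_{222} - \tau \mathcal{C}_2\|_F^2 \right) \le \mathbb{E} \|\mathcal{G}_{111} - \mathrm{diag}(\|\mathcal{G}_{222}\|)\|_F^2. \]

Since 

\[ \mathbb{E} \|\mathcal{G}_{111} - \mathrm{diag}(\|\mathcal{G}_{222}\|)\|_F^2 = r^3 + r \, \mathbb{E} \|\mathcal{G}_{222}\|^2 \le r^3 + r + r\left( \sqrt{n_1 - r} + \sqrt{n_2 - r} + \sqrt{n_3 - r} \right)^2 \le r^3 + r + 3r(n_1 + n_2 + n_3 - 3r), \]

it follows that 

\[ \mathbb{E} \inf_{\tau \ge 0} \mathrm{dist}_F^2(\mathcal{G}, \tau \partial \|\mathcal{X}^{\#}\|_*) \le r^3 + r + 3r(n_1 + n_2 + n_3 - 3r) + r(n_1 n_2 + n_2 n_3 + n_1 n_3) - r^2(n_1 + n_2 + n_3). \]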
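The term $r(n_1 n_2 + n_2 n_3 + n_1 n_3) - r^2(n_1 + n_2 + n_3)$ used in the proof is the number of entries of an $n_1 \times n_2 \times n_3$ array lying outside the two diagonal blocks of sizes $r \times r \times r$ and $(n_1 - r) \times (n_2 - r) \times (n_3 - r)$, each entry contributing 1 to the expected squared norm. The short check below confirms this counting identity on a grid of small sizes; the function names are illustrative.

```python
def off_diagonal_entries(r, n1, n2, n3):
    """Number of entries outside the r^3 block and the (n1-r)(n2-r)(n3-r) block."""
    total = n1 * n2 * n3
    return total - r**3 - (n1 - r) * (n2 - r) * (n3 - r)

def closed_form(r, n1, n2, n3):
    """Closed-form count appearing in the proof."""
    return r * (n1 * n2 + n2 * n3 + n1 * n3) - r**2 * (n1 + n2 + n3)

# Small grid of ranks and sizes with r <= min(n1, n2, n3).
cases = [(r, n1, n2, n3)
         for r in range(1, 4)
         for n1 in range(3, 7)
         for n2 in range(3, 7)
         for n3 in range(3, 7)]
all_match = all(off_diagonal_entries(*c) == closed_form(*c) for c in cases)
```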


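If the tensor is cubic, i.e. $n_i = n$ for $i = 1, 2, 3$, then we have with probability at least $1 - e^{-t^2/2}$ 
that 

\[ \|\hat{\mathcal{X}} - \mathcal{X}^{\#}\|_F \le \frac{2\eta}{\left[ \sqrt{m - 1} - \left( r^3 + r + 9r(n - r) + 3rn(n - r) \right)^{1/2} - t \right]_+}. \]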
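The cubic-case bound can be evaluated numerically as follows. The sketch assumes that the Gaussian width is bounded by the square root of $r^3 + r + 9r(n-r) + 3rn(n-r)$, since Proposition 4.4 controls the squared width, and it returns infinity when the bracket $[\cdot]_+$ is zero, in which case the bound carries no information. The function names and the example values of $\eta$, $m$, $r$, $n$, $t$ are hypothetical.

```python
import math

def cubic_width_sq_bound(r, n):
    """Bound on the squared Gaussian width for a cubic n x n x n tensor of rank r."""
    return r**3 + r + 9 * r * (n - r) + 3 * r * n * (n - r)

def error_bound(eta, m, r, n, t):
    """2*eta / [sqrt(m-1) - sqrt(width bound) - t]_+, or inf if the bracket is not positive."""
    denom = math.sqrt(m - 1) - math.sqrt(cubic_width_sq_bound(r, n)) - t
    return 2 * eta / denom if denom > 0 else math.inf

# Example: noise level 0.1, n = 20, rank 2, t = 3 (probability at least 1 - exp(-4.5)).
enough_measurements = error_bound(eta=0.1, m=4000, r=2, n=20, t=3.0)
too_few_measurements = error_bound(eta=0.1, m=2000, r=2, n=20, t=3.0)
```

With 4000 measurements the bound is finite and small, while with 2000 measurements the bracket is not positive and the bound is vacuous, even though both are below the 8000 entries of the tensor.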